Where Should Saliency Models Look Next?
Abstract
Recently, large breakthroughs have been observed in saliency modeling. The top scores on saliency benchmarks have become dominated by neural network models of saliency, and some evaluation scores have begun to saturate. Large jumps in performance relative to previous models can be found across datasets, image types, and evaluation metrics. Have saliency models begun to converge on human performance? In this paper, we re-examine the current state of the art using a fine-grained analysis of image types, individual images, and image regions. Using experiments to gather annotations for high-density regions of human eye fixations on images in two established saliency datasets, MIT300 and CAT2000, we quantify up to 60% of the remaining errors of saliency models. We argue that to continue to approach human-level performance, saliency models will need to discover higher-level concepts in images: text, objects of gaze and action, locations of motion, and expected locations of people in images. Moreover, they will need to reason about the relative importance of image regions, such as focusing on the most important person in the room or the most informative sign on the road. More accurately tracking performance will require finer-grained evaluations and metrics. Pushing performance further will require higher-level image understanding.
Similar resources
Compressed-Sampling-Based Image Saliency Detection in the Wavelet Domain
When watching natural scenes, an overwhelming amount of information is delivered to the Human Visual System (HVS). The optic nerve is estimated to receive around 10^8 bits of information a second. This large amount of information can’t be processed right away through our neural system. Visual attention mechanism enables HVS to spend neural resources efficiently, only on the selected parts of the...
CAT2000: A Large Scale Fixation Dataset for Boosting Saliency Research
Saliency modeling has been an active research area in computer vision for about two decades. Existing state of the art models perform very well in predicting where people look in natural scenes. There is, however, the risk that these models may have been overfitting themselves to available small scale biased datasets, thus trapping the progress in a local minimum. To gain a deeper insight regar...
Learning to predict where to look in interactive environments using deep recurrent q-learning
Bottom-Up (BU) saliency models do not perform well in complex interactive environments where humans are actively engaged in tasks (e.g., sandwich making and playing video games). In this paper, we leverage Reinforcement Learning (RL) to highlight task-relevant locations of input frames. We propose a soft attention mechanism combined with the Deep Q-Network (DQN) model to teach an RL agent h...
SUPPLEMENTAL MATERIAL: Where should saliency models look next?
In Fig. 1 we include the performances of the top four neural network models evaluated on the MIT300 dataset (as of March 2016), along with the top three non neural network models, and three traditional bottom-up approaches that are commonly used for saliency comparisons. The metrics reported are ones evaluated on the MIT Saliency Benchmark [1], supplemented with information gain (as recommended...
What can saliency models predict about eye movements? Spatial and sequential aspects of fixations during encoding and recognition.
Saliency map models account for a small but significant amount of the variance in where people fixate, but evaluating these models with natural stimuli has led to mixed results. In the present study, the eye movements of participants were recorded while they viewed color photographs of natural scenes in preparation for a memory test (encoding) and when recognizing them later. These eye movement...
Publication date: 2016